Statistical Corpus Analysis for KT-TREASURE: Korea Telecom Train Ticket Reservation Aid System Based upon Speech Recognition
Abstract
This paper describes the results of a statistical analysis of the corpus collected for KT-TREASURE (Korea Telecom Train ticket REservation Aid System based Upon speech REcognition). At the start of development, two speech corpora were collected: one from human-human (H-H) dialogues and the other from human-computer (H-C) dialogues. A Wizard of Oz (WOZ) experiment was carried out to collect the H-C spoken-dialogue corpus. Linguistic analysis shows that people respond differently when talking to a computer than when talking to a human. Since the basic grammatical unit in Korean is the morpheme, a morpheme-based Korean language model was designed in addition to a word-based one. We also defined a subword unit, lying between the word and the morpheme, and constructed a subword-based language model. The language-model analysis shows that the morpheme-based model reduces perplexity (PP) by 50% relative to the word-based model. It is also the least affected by vocabulary reduction.
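The abstract compares word-, subword-, and morpheme-based language models by perplexity, PP = exp(-(1/N) Σ log P(t_i | t_{i-1})). The sketch below is only a minimal illustration of that kind of comparison, not the paper's method. It makes several assumptions:

- It uses an add-one-smoothed bigram model. The abstract states neither the model order nor the smoothing.
- The romanized toy segmentations are hypothetical.
- The function names `train_bigram`, `perplexity` and `map_unk` are hypothetical.

Vocabulary reduction is modeled by keeping the `vocab_size` most frequent tokens and mapping the rest to `<unk>`. Per-token perplexity depends on how many tokens a segmentation produces. The optional `norm_units` argument therefore lets both models be normalized by the same word count. Whether the paper normalizes this way is not stated.

```python
import math
from collections import Counter

# Hypothetical names throughout; an illustrative sketch, not the
# KT-TREASURE implementation. Assumes an add-one-smoothed bigram model.


def map_unk(tokens, vocab):
    """Replace out-of-vocabulary tokens with <unk>."""
    return [t if t in vocab else "<unk>" for t in tokens]


def train_bigram(sentences, vocab_size=None):
    """Count bigrams over token lists (words, subwords, or morphemes).

    vocab_size keeps only the most frequent tokens (vocabulary reduction);
    every other token is mapped to <unk>. None keeps all tokens.
    """
    counts = Counter(t for s in sentences for t in s)
    kept = [t for t, _ in counts.most_common(vocab_size)]
    vocab = set(kept) | {"</s>", "<unk>"}  # every possible next token
    bigrams, contexts = Counter(), Counter()
    for s in sentences:
        seq = ["<s>"] + map_unk(s, vocab) + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return vocab, bigrams, contexts


def perplexity(sentences, model, norm_units=None):
    """PP = exp(-(1/N) * sum of log P(cur | prev)), add-one smoothed.

    By default N is the number of predicted tokens (including </s>);
    pass norm_units to normalize by a shared count such as words.
    """
    vocab, bigrams, contexts = model
    V = len(vocab)
    log_prob, n = 0.0, 0
    for s in sentences:
        seq = ["<s>"] + map_unk(s, vocab) + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            log_prob += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + V))
            n += 1
    return math.exp(-log_prob / (norm_units or n))


if __name__ == "__main__":
    # Hypothetical romanized toy data: the same utterances segmented into
    # space-delimited words (eojeol) and into morpheme-like units.
    train_words = [["seoul-eseo", "busan-kkaji", "yeyak-hae-juseyo"],
                   ["busan-eseo", "seoul-kkaji", "pyo-reul", "yeyak-hae-juseyo"]]
    train_morphs = [["seoul", "eseo", "busan", "kkaji", "yeyak", "hae", "juseyo"],
                    ["busan", "eseo", "seoul", "kkaji", "pyo", "reul",
                     "yeyak", "hae", "juseyo"]]
    test_words = [["daejeon-eseo", "busan-kkaji", "yeyak-hae-juseyo"]]
    test_morphs = [["daejeon", "eseo", "busan", "kkaji", "yeyak", "hae", "juseyo"]]

    n_words = sum(len(s) + 1 for s in test_words)  # words plus sentence ends
    for name, train, test in [("word", train_words, test_words),
                              ("morpheme", train_morphs, test_morphs)]:
        for size in (None, 3):
            pp = perplexity(test, train_bigram(train, size), norm_units=n_words)
            print(f"{name:9s} vocab={size or 'full':>4}  per-word PP={pp:.2f}")
```

On this toy data the numbers say nothing about the paper's 50% result. The sketch only shows how unit choice and vocabulary size enter the computation.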
Publication date: 1997